@zadzanl zadzanl commented Nov 1, 2025

Description

Adds instruction-aware query prefixes for DeepInfra embedding models to
improve semantic search accuracy. Changes include:

  • Added queryPrefix support for Qwen3-Embedding models (0.6B, 4B, 8B) with code
    search instruction format
  • Added queryPrefix for intfloat/multilingual-e5-large-instruct model
  • Added queryPrefix for google/embeddinggemma-300m with task-specific format
  • Added queryPrefix for BAAI/bge-large-en-v1.5 with passage retrieval format
  • Reduced MAX_ITEM_TOKENS from 8191 to 512 for compatibility with models
    that have 512 token limits
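The queryPrefix mechanism described above can be sketched as a per-model profile lookup. This is an illustrative sketch only: the profile shape, the `deepInfraModels` table, and the exact prefix strings are assumptions modeled on each vendor's documented instruction formats, not the PR's actual `embeddingModels.ts` contents.

```typescript
// Hypothetical sketch of the queryPrefix lookup; the real model profiles
// live in embeddingModels.ts and may differ in shape and wording.
interface EmbeddingModelProfile {
	dimension: number
	queryPrefix?: string
}

const deepInfraModels: Record<string, EmbeddingModelProfile> = {
	"Qwen/Qwen3-Embedding-0.6B": {
		dimension: 1024,
		// Qwen3 embedding models use an instruction-style query format.
		queryPrefix: "Instruct: Given a code search query, retrieve relevant code snippets\nQuery: ",
	},
	"intfloat/multilingual-e5-large-instruct": {
		dimension: 1024,
		queryPrefix: "Instruct: Given a search query, retrieve relevant passages\nQuery: ",
	},
	"BAAI/bge-large-en-v1.5": {
		dimension: 1024,
		// bge-large-en-v1.5 documents a fixed retrieval prefix for queries.
		queryPrefix: "Represent this sentence for searching relevant passages: ",
	},
}

function getModelQueryPrefix(modelId: string): string | undefined {
	return deepInfraModels[modelId]?.queryPrefix
}
```

Passages being indexed are embedded without a prefix; only search queries get one, which is why the lookup returns `undefined` for models (or call sites) that do not need it.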

Test Procedure

  • Confirmed MAX_ITEM_TOKENS reduction is applied in index.ts
  • Tested getModelQueryPrefix() function returns correct prefixes for each model
  • Validated existing functionality remains working for all models

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue.
  • Scope: My changes are focused on the linked issue.
  • Self-Review: I have performed a thorough self-review of my code.
  • Documentation Impact: I have considered if my changes require
    documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

Documentation Updates

  • No documentation updates are required.
  • Yes, documentation updates are required.

Additional Notes

The query prefixes are model-specific and follow the recommended format
from each model's documentation. This change is backward compatible
and won't affect existing implementations that don't use these specific models.

Get in Touch

Discord: badgambit


Important

Adds instruction-aware query prefixes for embedding models and adjusts token limits for compatibility.

  • Behavior:
    • Adds instruction-aware query prefixes for Qwen3-Embedding models (0.6B, 4B, 8B), intfloat/multilingual-e5-large-instruct, google/embeddinggemma-300m, and BAAI/bge-large-en-v1.5 in embeddingModels.ts.
    • Reduces MAX_ITEM_TOKENS from 8191 to 512 in index.ts for model compatibility.
  • Tests:
    • Adds tests in openai-compatible.spec.ts for DeepInfra provider detection and handling, including encoding format and response processing.
    • Validates getModelQueryPrefix() function returns correct prefixes for each model.
  • Misc:
    • Updates OpenAICompatibleEmbedder in openai-compatible.ts to handle different encoding formats based on provider type.

This description was created by Ellipsis for e9f2e0c.

CommitGambit and others added 8 commits October 31, 2025 19:38
Update branch to latest - 31 Oct 2025
…ld index

Added support for DeepInfra-hosted embedding models and fixed a critical bug where
the 'type' field index was missing in Qdrant, causing "Bad Request" errors
during code search operations.

Changes:
- Added DeepInfra provider detection in OpenAICompatibleEmbedder
  * Detect DeepInfra URLs (deepinfra.com)
  * Use 'float' encoding format for DeepInfra, 'base64' for other standard
    providers
  * Handle both float array and base64 string embedding responses
  * Added validation for embedding values (NaN/Infinity checking)
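The provider-detection and encoding-format bullets above can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names (`isDeepInfraUrl`, `encodingFormatFor`, `decodeBase64Embedding`) are illustrative, not the actual `OpenAICompatibleEmbedder` API, and the base64 decoding assumes the provider packs embeddings as little-endian float32, as OpenAI-compatible APIs conventionally do.

```typescript
// Hypothetical sketch of provider-aware encoding-format selection;
// the real logic lives in openai-compatible.ts and may differ.
function isDeepInfraUrl(baseUrl: string): boolean {
	try {
		return new URL(baseUrl).hostname.endsWith("deepinfra.com")
	} catch {
		return false // not a parseable URL, so not DeepInfra
	}
}

function encodingFormatFor(baseUrl: string): "float" | "base64" {
	// DeepInfra returns plain float arrays; other OpenAI-compatible
	// providers are asked for base64 to shrink the response payload.
	return isDeepInfraUrl(baseUrl) ? "float" : "base64"
}

// Decoding a base64-packed embedding back into numbers (Node.js Buffer):
function decodeBase64Embedding(b64: string): number[] {
	const buf = Buffer.from(b64, "base64")
	const floats = new Float32Array(buf.buffer, buf.byteOffset, buf.byteLength / 4)
	return Array.from(floats)
}
```

Handling both shapes at the response-processing layer lets the rest of the indexing pipeline stay agnostic to which provider produced the vectors.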

- Fixed missing Qdrant payload index for 'type' field
  * Non-existing `type` field causes "Bad Request" during `codebase_search`
    tool invocation
  * Create keyword index for 'type' field to support metadata filtering
  * Resolves "Index required but not found for 'type' field" error

- Added 7 DeepInfra embedding model profiles:
  * Qwen/Qwen3-Embedding-0.6B (1024 dims)
  * Qwen/Qwen3-Embedding-4B (2560 dims)
  * Qwen/Qwen3-Embedding-8B (4096 dims)
  * intfloat/multilingual-e5-large-instruct (1024 dims)
  * google/embeddinggemma-300m (768 dims)
  * BAAI/bge-m3 (1024 dims)
  * BAAI/bge-large-en-v1.5 (1024 dims)

- Added test coverage for DeepInfra
  * Provider validation
  * Encoding format tests
  * Float array and base64 response handling tests
  * Configuration validation tests

Tested with: embeddinggemma-300m, text-embedding-004, multilingual-e5-large
feat: add DeepInfra embedding support and fix missing Qdrant `type` index
…dels

- Add queryPrefix support for Qwen3-Embedding models (0.6B, 4B, 8B)
- Add queryPrefix for intfloat/multilingual-e5-large-instruct
- Add queryPrefix for google/embeddinggemma-300m
- Add queryPrefix for BAAI/bge-large-en-v1.5
- Reduce MAX_ITEM_TOKENS from 8191 to 512 for compatibility with models that have 512 token limits (e5-large, bge-large-en-v1.5)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@zadzanl zadzanl requested review from cte, jr and mrubens as code owners November 1, 2025 19:41
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 1, 2025
@zadzanl zadzanl closed this Nov 1, 2025
@roomote

roomote bot commented Nov 1, 2025

See this task on Roo Code Cloud

Found 1 issue that needs to be addressed:

  • Add test coverage for query prefix functionality in openai-compatible.spec.ts

Mention @roomote in a comment to trigger your PR Fixer agent and make changes to this pull request.

@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Nov 1, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Nov 1, 2025
@dosubot dosubot bot added the enhancement New feature or request label Nov 1, 2025

The query prefix functionality added in this PR (lines 114-137) lacks test coverage. While DeepInfra provider detection and encoding format are tested, there are no tests verifying that getModelQueryPrefix() prefixes are actually being applied to queries. Consider adding tests that verify: (1) prefixes are correctly added for models that require them, (2) double-prefixing is prevented, and (3) texts that would exceed MAX_ITEM_TOKENS after prefixing are handled appropriately.
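The double-prefixing concern the reviewer raises can be sketched as a small guard plus the assertions a test would make. The `applyQueryPrefix` helper here is hypothetical, shown only to illustrate the behavior under test; the PR's actual prefix application lives inside the embedder and may be structured differently.

```typescript
// Hypothetical helper illustrating the behavior the requested tests
// should verify; not the PR's actual API.
function applyQueryPrefix(query: string, prefix?: string): string {
	if (!prefix) return query
	// Guard against double-prefixing when a caller already prepended it.
	return query.startsWith(prefix) ? query : prefix + query
}

// The requested tests would assert, in vitest/jest style:
//   expect(applyQueryPrefix("find auth code", PREFIX)).toBe(PREFIX + "find auth code")
//   expect(applyQueryPrefix(PREFIX + "find auth code", PREFIX)).toBe(PREFIX + "find auth code")
//   expect(applyQueryPrefix("find auth code", undefined)).toBe("find auth code")
```

A third case worth covering, as the comment notes, is a text near MAX_ITEM_TOKENS that only exceeds the limit once the prefix is added; the test should pin down whether such inputs are truncated, rejected, or embedded unprefixed.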

@zadzanl zadzanl changed the title Feature/instruction aware embeddings sync Feature/instruction aware embeddings Nov 1, 2025

Labels

enhancement New feature or request size:L This PR changes 100-499 lines, ignoring generated files.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants